MXM (version 0.9.4)

Conditional independence test for case control data: Conditional independence test based on conditional logistic regression for case control studies

Description

The main task of this test is to provide a p-value PVALUE for the null hypothesis: feature 'X' is independent of 'TARGET' given a conditioning set CS. The p-value is calculated by comparing a conditional logistic regression model based on the conditioning set CS against a model whose regressors are both X and CS. The comparison is performed through a chi-square test, with the appropriate degrees of freedom, on the difference between the deviances of the two models. The test is suitable for a case-control design.
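The comparison described above can be sketched outside of MXM. The snippet below is a minimal illustration, not the package's internal implementation: it assumes the survival package is installed, simulates a hypothetical 1:2 matched design, and uses illustrative variable names (x, cs, d) that do not appear in the package.

```r
library(survival)  # provides clogit() and strata()

set.seed(1)
n_sets <- 100                               # assumed: 100 matched sets, 1 case and 2 controls each
id   <- rep(seq_len(n_sets), each = 3)      # matched-set identifier
case <- rep(c(1, 0, 0), times = n_sets)     # 1 = case, 0 = control
x    <- rnorm(3 * n_sets)                   # feature X under test (pure noise here)
cs   <- rnorm(3 * n_sets)                   # a single conditioning variable CS
d    <- data.frame(case, id, x, cs)

# Model based on the conditioning set only
fit_null <- clogit(case ~ cs + strata(id), data = d)
# Model whose regressors are both X and CS
fit_full <- clogit(case ~ x + cs + strata(id), data = d)

# fit$loglik holds c(initial, final) log-likelihoods; the deviance
# difference is twice the gain in the final log-likelihood
stat   <- 2 * (fit_full$loglik[2] - fit_null$loglik[2])
dof    <- length(coef(fit_full)) - length(coef(fit_null))
pvalue <- pchisq(stat, df = dof, lower.tail = FALSE)

c(stat = stat, df = dof, pvalue = pvalue)
```

Because x is simulated independently of case, a large p-value is the typical outcome here; the exact number depends on the seed.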

Usage

testIndClogit(target, dataset, xIndex, csIndex, dataInfo = NULL, univariateModels = NULL, hash = FALSE, stat_hash = NULL, pvalue_hash = NULL, robust = FALSE)

Arguments

target
A matrix with two columns. The first column must contain only 0 and 1, where 0 = control and 1 = case. The second column is the id of the patients, a numerical variable, for example c(1,2,3,4,5,6,7,1,2,3,4,5,6,7).
dataset
A numeric matrix or a data.frame in case of categorical predictors (factors), containing the variables for performing the test. Rows as samples and columns as features.
xIndex
The index of the variable whose association with the target we want to test.
csIndex
The indices of the variables to condition on.
dataInfo
A list object with information on the structure of the data. Default value is NULL.
univariateModels
Fast alternative to the hash object for univariate tests. A list with vectors "pvalues" (p-values), "stats" (statistics) and "flags" (flag = TRUE if the test was successful) representing the univariate association of each variable with the target. Default value is NULL.
hash
A boolean variable which indicates whether (TRUE) or not (FALSE) to use the hash-based implementation of the statistics of SES. Default value is FALSE. If TRUE, you have to specify the stat_hash and pvalue_hash arguments.
stat_hash
A hash object (hash package required) which contains the cached generated statistics of a SES run in the current dataset, using the current test.
pvalue_hash
A hash object (hash package required) which contains the cached generated p-values of a SES run in the current dataset, using the current test.
robust
A boolean variable which indicates whether (TRUE) or not (FALSE) to use a robustified version of the regression. Currently it is not available for this test.
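As a rough illustration of the shapes described for target and dataset, the base R sketch below builds both objects for a hypothetical design in which every id has exactly one case and two controls. The sizes and the 1:2 ratio are assumptions made for the example only, not requirements stated by the package.

```r
# Hypothetical 1:2 matched design: 100 ids, each with one case and two controls
n_ids <- 100
id    <- rep(seq_len(n_ids), times = 3)     # ids 1..100 repeated three times
case  <- rep(c(1, 0, 0), each = n_ids)      # first 100 rows are cases, the rest controls
target <- cbind(case, id)                   # two columns: case indicator, then id

# Rows as samples, columns as features
set.seed(1)
dataset <- matrix(rnorm(nrow(target) * 100), nrow = nrow(target))

# Sanity checks on the structure described in the arguments
stopifnot(ncol(target) == 2)
stopifnot(all(target[, "case"] %in% c(0, 1)))
stopifnot(nrow(dataset) == nrow(target))

# Number of cases per id
cases_per_id <- tapply(target[, "case"], target[, "id"], sum)
table(cases_per_id)
```

Objects built this way can be passed as the first two arguments of testIndClogit, with xIndex and csIndex chosen among the column indices of dataset.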

Value

A list including:

Details

If hash = TRUE, testIndClogit requires the arguments 'stat_hash' and 'pvalue_hash' for the hash-based implementation of the statistical test. These hash objects are produced or updated by each run of SES (if hash = TRUE) and can be reused to speed up subsequent runs of the current statistical test. If "SESoutput" is the output of a SES run, these objects can be retrieved with SESoutput@hashObject$stat_hash and SESoutput@hashObject$pvalue_hash.

Important: Use these arguments only with the same dataset that was used at initialization.
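The hash workflow described above might look like the following sketch. It assumes that MXM and the hash package are installed and that SES accepts a hash argument, as the text implies; the simulated 1:2 matched data and the name SESoutput are illustrative only.

```r
library(MXM)

set.seed(1)
# Hypothetical matched data: 100 ids, one case and two controls per id
id      <- rep(1:100, times = 3)
case    <- rep(c(1, 0, 0), each = 100)
target  <- cbind(case, id)
dataset <- matrix(rnorm(300 * 100), nrow = 300)

# Run SES with hashing enabled so that statistics and p-values are cached
SESoutput <- SES(target, dataset, max_k = 3, threshold = 0.05,
                 test = "testIndClogit", hash = TRUE)

# Retrieve the cached objects from the SES output
stat_hash   <- SESoutput@hashObject$stat_hash
pvalue_hash <- SESoutput@hashObject$pvalue_hash

# Reuse the caches in a single test, on the SAME dataset used by SES
res <- testIndClogit(target, dataset, xIndex = 44, csIndex = 60,
                     hash = TRUE, stat_hash = stat_hash,
                     pvalue_hash = pvalue_hash)
res
```

The caches are keyed to the dataset and test they were built with, which is why the same target and dataset are passed to both calls.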

For all the available conditional independence tests that are currently included in the package, please see "?CondIndTests".

References

Ferrari S.L.P. and Cribari-Neto F. (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7): 799-815.

See Also

SES, testIndLogistic, censIndCR, censIndWR

Examples

library(MXM)

# simulate a dataset with continuous data: 300 samples, 100 features
dataset <- matrix(rnorm(300 * 100), nrow = 300)
# simulate the case/control indicator and keep exactly 100 cases
case <- rbinom(300, 1, 0.6)
ina <- which(case == 1)
ina <- sample(ina, 100)
case[-ina] <- 0
# patient ids: each id from 1 to 100 appears three times
id <- rep(1:100, 3)
# the target is a two-column matrix: case indicator and id
target <- cbind(case, id)

# test whether feature 44 is independent of the target given feature 60
results <- testIndClogit(target, dataset, xIndex = 44, csIndex = 60)
results

# require(gRbase)  # for faster computations in the internal functions

# run the SES and MMPC algorithms using the testIndClogit conditional independence test
a1 <- SES(target, dataset, max_k = 3, threshold = 0.05, test = "testIndClogit")
a2 <- MMPC(target, dataset, max_k = 3, threshold = 0.05, test = "testIndClogit")
# print summary of the SES output
summary(a1)
# plot the SES output
plot(a1, mode = "all")